TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism
Abstract
Modern supercomputers have very powerful multi-core CPUs. The programming model on these supercomputers is switching from pure MPI to a hybrid model: MPI for inter-node communication, and shared memory with threads for intra-node communication. Consequently, the bottleneck in most systems is no longer computation but communication between nodes. In this paper, we present a new compositing algorithm for hybrid MPI parallelism that focuses on avoiding communication and on overlapping communication with computation, at the expense of an evenly balanced workload. The algorithm has three stages: a direct send stage, in which nodes are arranged in groups and exchange regions of an image, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting, show strong scaling results, and explain how we generally achieve better performance than these two algorithms.
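The three stages named in the abstract can be pictured with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`over`, `split`, `tod_tree_sketch`, `reference`), the premultiplied RGBA pixel format, the front-to-back ordering of nodes by index, and the pairing of adjacent groups in the tree stage are all choices made here for illustration. It also does not model the overlap of communication with computation, nor any MPI or OpenMP calls.

```python
# Illustrative single-process simulation; NOT the authors' implementation.
# Assumptions (not from the source): premultiplied RGBA pixels, nodes ordered
# front to back by index, group size r divides the node count, and each
# group member owns one of r contiguous image regions.

def over(front, back):
    """Composite premultiplied RGBA pixel `front` over `back`."""
    t = 1.0 - front[3]
    return tuple(f + t * b for f, b in zip(front, back))

def composite(front_img, back_img):
    """Composite two equally sized pixel lists, pixel by pixel."""
    return [over(f, b) for f, b in zip(front_img, back_img)]

def split(image, r):
    """Split a flat pixel list into r contiguous regions."""
    n = len(image)
    bounds = [n * i // r for i in range(r + 1)]
    return [image[bounds[i]:bounds[i + 1]] for i in range(r)]

def tod_tree_sketch(images, r):
    p = len(images)
    assert p % r == 0, "sketch assumes the group size divides the node count"
    groups = [images[g:g + r] for g in range(0, p, r)]

    # Stage 1: direct send inside each group. Member k ends up owning
    # region k, composited front to back from every member's region k.
    owned = []  # owned[g][k] = region k composited over group g
    for group in groups:
        pieces = [split(img, r) for img in group]
        regions = []
        for k in range(r):
            acc = pieces[0][k]
            for m in range(1, r):
                acc = composite(acc, pieces[m][k])
            regions.append(acc)
        owned.append(regions)

    # Stage 2: tree compositing across groups, region by region.
    # Adjacent groups are paired so the front-to-back order is preserved.
    while len(owned) > 1:
        merged = []
        for i in range(0, len(owned) - 1, 2):
            merged.append([composite(a, b)
                           for a, b in zip(owned[i], owned[i + 1])])
        if len(owned) % 2 == 1:
            merged.append(owned[-1])  # unpaired group moves up unchanged
        owned = merged

    # Stage 3: gather the r finished regions into the final image.
    final = []
    for region in owned[0]:
        final.extend(region)
    return final

def reference(images):
    """Plain serial front-to-back compositing, used to check the sketch."""
    acc = images[0]
    for img in images[1:]:
        acc = composite(acc, img)
    return acc
```

Calling `tod_tree_sketch(images, r)` on a list of per-node images should agree with `reference(images)` up to floating-point rounding, since the over operator is associative and the sketch never reorders the nodes.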
Similar resources
An Effective Load Balancing Scheme for 3D Texture-Based Sort-Last Parallel Volume Rendering on GPU Clusters
We present an adaptive dynamic load balancing scheme for 3D texture-based sort-last parallel volume rendering on a PC cluster equipped with GPUs. Our scheme exploits not only task parallelism but also data parallelism during rendering by combining hierarchical data structures (an octree and a parallel BSP tree) in order to skip empty regions and distribute proper workloads to rendering nodes. Ou...
Estimation of Tree Biomass at Individual tree, Sample plot and Hybrid Level using Drone Images
Algorithms that convert two-dimensional images to 3D data raise the hope that the structural properties of trees can be extracted from such images. In this study, the accuracy of biomass estimation at the tree, plot, and hybrid levels using UAV images was investigated. In 34.8 ha of Sisangan Forest Park, 854 images were acquired with a quadcopter from an altitude of 100 meters above ground. SF...
Parallel Implementation of Decision Tree Learning Algorithms
In the fields of data mining and machine learning, the amount of data available for building classifiers is growing very fast. Therefore, there is a great need for algorithms that can build classifiers from very large datasets while remaining computationally efficient and scalable. One possible solution is to employ parallelism to reduce the amount of time spent in build...
Shift-Based Parallel Image Compositing on InfiniBand™ Fat-Trees
Parallel image compositing has been widely studied over the past 20 years, as this is one, if not the most, crucial element in the implementation of a scalable parallel rendering system. Many algorithms have been proposed and implemented on a large variety of supercomputers. Among the existing supercomputers, InfiniBand™ (IB) PC clusters, and their associated fat-tree topology, are clearly bec...
Massively parallel volume rendering using 2-3 swap image compositing
The ever-increasing amounts of simulation data produced by scientists demand high-end parallel visualization capability. However, image compositing, which requires interprocessor communication, is often the bottleneck stage for parallel rendering of large volume data sets. Existing image compositing solutions either incur a large number of messages exchanged among processors (such as the direct...
Publication date: 2015